System and method for generating manually designed and automatically optimized spoken dialog systems

ABSTRACT

Disclosed herein are systems, computer-implemented methods, and tangible computer-readable storage media for generating a natural language spoken dialog system. The method includes nominating a set of allowed dialog actions and a set of contextual features at each turn in a dialog, and selecting an optimal action from the set of nominated allowed dialog actions using a machine learning algorithm. The method includes generating a response based on the selected optimal action at each turn in the dialog. The set of manually nominated allowed dialog actions can incorporate a set of business rules. Prompt wordings in the generated natural language spoken dialog system can be tailored to a current context while following the set of business rules. A compression label can represent at least one of the manually nominated allowed dialog actions.

PRIORITY

The present application is a continuation of U.S. patent application Ser. No. 15/185,304, filed Jun. 17, 2016, which is a continuation of U.S. patent application Ser. No. 14/617,172, filed Feb. 9, 2015, now U.S. Pat. No. 9,373,323, issued Jun. 21, 2016, which is a continuation of U.S. patent application Ser. No. 14/338,550, filed Jul. 23, 2014, now U.S. Pat. No. 8,954,319, issued Feb. 10, 2015, which is a continuation of U.S. patent application Ser. No. 12/501,925, filed Jul. 13, 2009, now U.S. Pat. No. 8,793,119, issued Jul. 29, 2014, the contents of which are incorporated herein by reference in their entirety.

BACKGROUND

1. Technical Field

The present disclosure relates to spoken dialog systems and more specifically to combining manual design of spoken dialog systems with an automatic learning approach.

2. Introduction

The development of interactive computer systems is expensive and time-consuming. Further, the user interface to such systems poses a significant challenge. Despite years of research, speech recognition technology is far from perfect, and speech recognition errors remain a central problem for the user interface. Misunderstanding the user's speech causes the system to get off track and often leads to failed dialogs.

Two approaches are commonly used for generating spoken dialog systems: the conventional approach and the automatic learning approach. The conventional or manual design approach is often used in commercial or industrial settings. Such commercial systems have a manually designed computer program controlling the flow of the conversation. A dialog designer can tailor all the prompts to say exactly what she wants. Because a computer program controls the dialog flow, a designer can modify the computer program to encode business rules. Examples of business rules include “always confirm a money transfer with a yes/no question” and “never display account info unless the corresponding user account is verified.” A dialog designer must generate detailed flow charts outlining the possible branches in the conversation. These flow charts can be incredibly large and complicated (e.g., hundreds of Microsoft Visio pages) because conversations are temporal. At every point, the person can say something different, so the tree is complicated with lots of branches and loops. A designer typically ignores a lot of state information, history, and dialog details to simplify these complicated trees. As such, manually designed systems are not very robust to speech recognition errors.

The automatic learning approach uses machine learning and optimization to design the dialog system. Instead of specifying when the system should take a certain action as in the conventional approach set forth above, the system selects an action from a palette of possible actions. For example, in an airline dialog system, the system can say “Where do you want to fly from?”, “Where do you want to fly to?”, “OK, you want to fly to Phoenix.”, confirm the date or flight class, print a ticket, etc. The optimization procedure is unconstrained regarding the order or dependencies between variables and may take any action at any time. The automatic learning approach interacts with a user simulation and employs reinforcement learning to try out all the different sequences of actions in order to come up with a dialog plan. This approach still requires a lot of work, but the dialog plan is more robust and detailed. The dialog system is not bounded by what the designer can hold in her head or express in numerous Visio pages. The dialog becomes an optimization problem that a computer can solve with as much detail as desired.

However, both of these approaches have shortcomings. Automatic learning does not provide a good way to express business rules in this context because the system can take any of the actions at any time. The automatic learning approach also encounters difficulty knowing how to tailor prompts appropriately. For example, the system knows that there is a certain way of asking “where are you flying to”, or on what date, but has a hard time knowing how and when to say things like, “Oh, sorry. Where do you want to fly from?” Those joining words and phrases and intonations designed to elicit just the right response from users are difficult to generate in this approach because the system just knows the general form of the question but not how to tailor that question to different situations. Accordingly, what is needed in the art is an improved way to blend the strengths of the conventional and automatic learning approaches while minimizing their shortcomings.

SUMMARY

Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or can be learned by practice of the herein disclosed principles. The features and advantages of the disclosure can be realized and obtained by means of the instruments and combinations particularly pointed out in the appended claims. These and other features of the disclosure will become more fully apparent from the following description and appended claims, or can be learned by the practice of the principles set forth herein.

Disclosed are systems, computer-implemented methods, and tangible computer-readable storage media for generating a natural language spoken dialog system. The method includes nominating a set of allowed dialog actions and a set of contextual features at each turn in a dialog, and selecting an optimal action from the set of nominated allowed dialog actions using a machine learning algorithm. The method also includes generating a response based on the selected optimal action at each turn in the dialog or generating a spoken dialog system based on the process of selecting optimal actions at each dialog turn. The set of manually nominated allowed dialog actions can incorporate a set of business rules. Prompt wordings in the generated natural language spoken dialog system can be tailored to a current context while following the set of business rules. To facilitate optimization by the machine learning algorithm, a compression label can represent at least one of the manually nominated allowed dialog actions.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to describe the manner in which the above-recited and other advantages and features of the disclosure can be obtained, a more particular description of the principles briefly described above will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. Understanding that these drawings depict only exemplary embodiments of the disclosure and are not therefore to be considered to be limiting of its scope, the principles herein are described and explained with additional specificity and detail through the use of the accompanying drawings in which:

FIG. 1 illustrates an example system embodiment;

FIG. 2 illustrates an example method embodiment;

FIG. 3 illustrates sample task completion rates;

FIG. 4 illustrates a standard deviation of the average total reward per dialog; and

FIG. 5 illustrates an example conversation and operation of the method in detail.

DETAILED DESCRIPTION

Various embodiments of the disclosure are discussed in detail below. While specific implementations are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without parting from the spirit and scope of the disclosure.

With reference to FIG. 1, an exemplary system 100 includes a general-purpose computing device 100, including a processing unit (CPU or processor) 120 and a system bus 110 that couples various system components including the system memory 130 such as read only memory (ROM) 140 and random access memory (RAM) 150 to the processor 120. These and other modules can be configured to control the processor 120 to perform various actions. Other system memory 130 may be available for use as well. It can be appreciated that the disclosure may operate on a computing device 100 with more than one processor 120 or on a group or cluster of computing devices networked together to provide greater processing capability. The processor 120 can include any general purpose processor and a hardware module or software module, such as module 1 162, module 2 164, and module 3 166 stored in storage device 160, configured to control the processor 120 as well as a special-purpose processor where software instructions are incorporated into the actual processor design. The processor 120 may essentially be a completely self-contained computing system, containing multiple cores or processors, a bus, memory controller, cache, etc. A multi-core processor may be symmetric or asymmetric.

The system bus 110 may be any of several types of bus structures including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. A basic input/output system (BIOS) stored in ROM 140 or the like, may provide the basic routine that helps to transfer information between elements within the computing device 100, such as during start-up. The computing device 100 further includes storage devices 160 such as a hard disk drive, a magnetic disk drive, an optical disk drive, tape drive or the like. The storage device 160 can include software modules 162, 164, 166 for controlling the processor 120. Other hardware or software modules are contemplated. The storage device 160 is connected to the system bus 110 by a drive interface. The drives and the associated computer readable storage media provide nonvolatile storage of computer readable instructions, data structures, program modules and other data for the computing device 100. In one aspect, a hardware module that performs a particular function includes the software component stored in a tangible and/or intangible computer-readable medium in connection with the necessary hardware components, such as the processor 120, bus 110, display 170, and so forth, to carry out the function. The basic components are known to those of skill in the art and appropriate variations are contemplated depending on the type of device, such as whether the device 100 is a small, handheld computing device, a desktop computer, or a computer server.

Although the exemplary embodiment described herein employs the hard disk 160, it should be appreciated by those skilled in the art that other types of computer readable media which can store data that are accessible by a computer, such as magnetic cassettes, flash memory cards, digital versatile disks, cartridges, random access memories (RAMs) 150, read only memory (ROM) 140, a cable or wireless signal containing a bit stream and the like, may also be used in the exemplary operating environment. Tangible computer-readable storage media expressly exclude media such as energy, carrier signals, electromagnetic waves, and signals per se.

To enable user interaction with the computing device 100, an input device 190 represents any number of input mechanisms, such as a microphone for speech, a touch-sensitive screen for gesture or graphical input, keyboard, mouse, motion input, speech and so forth. The input device 190 may be used by the presenter to indicate the beginning of a speech search query. An output device 170 can also be one or more of a number of output mechanisms known to those of skill in the art. In some instances, multimodal systems enable a user to provide multiple types of input to communicate with the computing device 100. The communications interface 180 generally governs and manages the user input and system output. There is no restriction on operating on any particular hardware arrangement and therefore the basic features here may easily be substituted for improved hardware or firmware arrangements as they are developed.

For clarity of explanation, the illustrative system embodiment is presented as including individual functional blocks including functional blocks labeled as a “processor” or processor 120. The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software and hardware, such as a processor 120, that is purpose-built to operate as an equivalent to software executing on a general purpose processor. For example the functions of one or more processors presented in FIG. 1 may be provided by a single shared processor or multiple processors. (Use of the term “processor” should not be construed to refer exclusively to hardware capable of executing software.) Illustrative embodiments may include microprocessor and/or digital signal processor (DSP) hardware, read-only memory (ROM) 140 for storing software performing the operations discussed below, and random access memory (RAM) 150 for storing results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.

The logical operations of the various embodiments are implemented as: (1) a sequence of computer implemented steps, operations, or procedures running on a programmable circuit within a general use computer, (2) a sequence of computer implemented steps, operations, or procedures running on a specific-use programmable circuit; and/or (3) interconnected machine modules or program engines within the programmable circuits. The system 100 shown in FIG. 1 can practice all or part of the recited methods, can be a part of the recited systems, and/or can operate according to instructions in the recited tangible computer-readable storage media. Generally speaking, such logical operations can be implemented as modules configured to control the processor 120 to perform particular functions according to the programming of the module. For example, FIG. 1 illustrates three modules Mod1 162, Mod2 164 and Mod3 166 which are modules configured to control the processor 120. These modules may be stored on the storage device 160 and loaded into RAM 150 or memory 130 at runtime or may be stored as would be known in the art in other computer-readable memory locations.

Having disclosed some basic system components, the disclosure now turns to background material relevant for understanding the method. First the disclosure describes how dialog systems operate in general.

At each turn in a dialog, the dialog system takes a speech action A, such as “Where are you leaving from?” A user then responds with action U, such as “Boston”. The speech recognition engine processes this U to produce an observation O, such as “AUSTIN”. The dialog system examines O, updates its internal state, and outputs another A. There are two common approaches used for generating spoken dialog systems: the conventional approach and the automatic learning approach. The conventional and automatic approaches differ in how they maintain internal state, and how they choose actions given the state.

A conventional dialog manager maintains a state N such as a form or frame and relies on two functions for control, G and F. For a given dialog state N, G(N)=A decides which system action to output, and then after observation O has been received, F(N, O)=N′ decides how to update the dialog state N to yield N′. This process repeats until the dialog is over. G and F are written by hand, for example in a language such as VoiceXML.
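The control loop of a conventional dialog manager can be sketched as follows. This is a minimal illustration only: the state fields (`callee`, `confirmed`), the action names, and the particular rules inside `G` and `F` are hypothetical and are not taken from any deployed system or from VoiceXML.

```python
from typing import Dict, List, Optional

State = Dict[str, object]


def G(n: State) -> str:
    """Hand-written action selection: exactly one action A per dialog state N."""
    if n["callee"] is None:
        return "ask_name"
    if not n["confirmed"]:
        return "confirm_name"
    return "transfer"


def F(n: State, o: str) -> State:
    """Hand-written state update: N' = F(N, O)."""
    n2 = dict(n)
    if n["callee"] is None:
        n2["callee"] = o              # trust the recognizer's top result
    elif not n["confirmed"]:
        if o == "YES":
            n2["confirmed"] = True
        else:
            n2["callee"] = None       # rejected: start over
    return n2


def run_dialog(observations: List[str]) -> List[str]:
    """Alternate G and F until the dialog ends or observations run out."""
    n: State = {"callee": None, "confirmed": False}
    actions: List[str] = []
    obs = iter(observations)
    while True:
        a = G(n)
        actions.append(a)
        if a == "transfer":
            return actions
        o: Optional[str] = next(obs, None)
        if o is None:
            return actions
        n = F(n, o)
```

Because `G` returns a single action and `F` keeps only the most recent recognition result, a misrecognition such as “AUSTIN” for “Boston” can only be repaired by starting over, which is the lack of robustness noted above.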

Next is described how an automatic approach operates. For clarity of exposition, one automatic approach in particular is chosen, called a partially observable Markov decision process (POMDP), to use in this description. However those skilled in the art will recognize that other automatic approaches could be used in place of a POMDP.

Unlike the conventional approach, a POMDP tracks a probability distribution over many dialog states. In the POMDP, there is a set of hidden states, where each hidden state S represents a possible state of the conversation, including quantities such as the user's action U, the user's underlying goals, and the dialog history. Because the true state of the conversation isn't known, the POMDP maintains a belief state (probability distribution) over these hidden states, B, where B(S) is the belief (probability) that S is the true state. By adding models of how the hidden state changes and how the observation is corrupted, it is straightforward to update this distribution, i.e., B′(S′)=P(S′|A, O, B). The system 100 can employ various methods for doing this efficiently and/or approximately. The belief state has the desirable property of accumulating information across all of the actions and observations over the course of the entire dialog history, and provides robustness to speech recognition errors.
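As an illustration of the belief update, the following sketch applies the standard discrete Bayes filter, B′(S′) ∝ P(O|S′, A) Σ_S P(S′|S, A) B(S). The function name and the callable model interfaces (`trans`, `obs`) are assumptions made for this example; the disclosure does not prescribe a particular implementation, and practical systems may use approximations.

```python
from typing import Callable, Dict


def belief_update(
    b: Dict[str, float],
    a: str,
    o: str,
    trans: Callable[[str, str, str], float],  # trans(s2, s, a) = P(s2 | s, a)
    obs: Callable[[str, str, str], float],    # obs(o, s2, a)  = P(o | s2, a)
) -> Dict[str, float]:
    """Return B' given belief B, system action A, and observation O."""
    unnormalized = {}
    for s2 in b:
        predicted = sum(trans(s2, s, a) * p for s, p in b.items())
        unnormalized[s2] = obs(o, s2, a) * predicted
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: v / z for s, v in unnormalized.items()}


# Toy models (hypothetical numbers): the user's goal never changes, and the
# recognizer reports the true city 70% of the time.
def static_goal(s2: str, s: str, a: str) -> float:
    return 1.0 if s2 == s else 0.0


def noisy_asr(o: str, s2: str, a: str) -> float:
    return 0.7 if o == s2 else 0.3
```

With a uniform prior over two cities, one “AUSTIN” observation shifts the belief toward Austin without discarding Boston, and a later “BOSTON” observation can pull the belief back. This is how evidence accumulates across turns.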

In principle, a developer could write a function to choose actions G(B)=A, but in practice it is extremely difficult for a person to see how to make use of all of the information in the belief state, especially with even a moderately complex dialog. Instead, reinforcement learning is applied, in which a developer specifies high-level goals in the form of a reward function, R(S, A). R assigns a measure of goodness to each state/action pair and communicates, for example, the relative values of short dialogs and successful task completion. An optimization procedure then searches for the best action to take in each belief state in order to maximize the sum of rewards over the whole dialog. The result is a value function Q(B, A), which estimates the long-term reward of taking action A at belief state B. The optimal action in belief state B is then A*=argmax_(A) Q(B, A).

In practice, the domain of Q(B, A) is too large and compression is applied. One method is the so-called “summary” method. The intuition is to map B and A into lower-dimensional feature vectors B̂ and Â, and to estimate a value function Q̂(B̂, Â) in this compressed space. For example, B̂ can reduce a distribution over all cities into the probability of only the most likely city, and Â can compress the class of confirm actions (confirm(london), confirm(boston)) into a single confirm(most-likely-city).
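The compression just described can be sketched in a few lines. The function names and the (verb, argument) representation of actions are hypothetical; only the idea of keeping the top probability and collapsing confirm actions comes from the text.

```python
from typing import Dict, Optional, Tuple

Action = Tuple[str, Optional[str]]  # (verb, city argument)


def compress_belief(b: Dict[str, float]) -> float:
    """B-hat: keep only the probability of the most likely city."""
    return max(b.values())


def compress_action(action: Action) -> str:
    """A-hat: every confirm(<city>) collapses to one summary action."""
    verb, _ = action
    return "confirm(most-likely-city)" if verb == "confirm" else verb


def expand_action(summary: str, b: Dict[str, float]) -> Action:
    """Map a summary action back to a full action using the current belief."""
    if summary == "confirm(most-likely-city)":
        return ("confirm", max(b, key=b.get))
    return (summary, None)
```

The value function then needs one entry for the confirm family rather than one per city, at the cost of being able to confirm only the most likely city.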

The method described herein unifies these two approaches by making several changes. The method extends the conventional dialog manager in at least one of three respects. First, its action selection function G(N)=A is changed to output a set of one or more (M) allowable actions given a dialog state N, each with a corresponding summary action, G(N)={(A_(1), Â_(1)), . . . , (A_(M), Â_(M))}. Next, its transition function F(N, O)=N′ is extended to allow for different transitions depending on which of these actions was taken, and it is also given access to the resulting POMDP belief state, F(N, A, O, B′)=N′. A human dialog designer still designs the contents of the state N and writes the functions G and F.

For action selection, the system 100 can apply compression but the state features used for action selection will be a function of both the belief state B and the dialog state N. This state feature vector is written X̂ and is computed by a feature-function H(B, N)=X̂. The POMDP value function is correspondingly re-cast to assign values to these feature vectors, Q̂(X̂, Â).

The unified dialog manager operates as follows. At each time-step, the dialog manager is in state N and the POMDP is in belief state B. The dialog manager nominates a set of M allowable actions, where each action A_(m), m = 1, . . . , M, is paired with its summarized counterpart Â_(m). The state features are computed as X̂=H(B, N). Then, the POMDP value function Q̂(X̂, Â) is evaluated for only those actions nominated by the dialog manager (not all actions), and the index M* of the action that maximizes the POMDP value function is returned:

$$M^{*} = \arg\max_{m \in \{1, \ldots, M\}} \hat{Q}\left(\hat{X}, \hat{A}_{m}\right) \qquad \text{(Equation 1)}$$

The system 100 outputs action A_(M*) and receives reward R and observation O. The POMDP updates its belief state B′(S′)=P(S′|A_(M*), O, B) and the dialog manager transitions to dialog state N′=F(N, A_(M*), O, B′). An example of this process taken from the real system described below is shown in FIG. 5.
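The action selection step of the unified dialog manager can be sketched as follows. Everything specific here is an assumption for illustration: the state fields, the nomination rules in `G`, the two features returned by `H`, and especially the closed-form `q_hat`, which stands in for a value function that would in practice be produced by optimization.

```python
from typing import Callable, Dict, List, Tuple

Action = Tuple[str, str]        # (full prompt A, summary action A-hat)
Features = Tuple[float, bool]   # (belief in top callee, confirmation attempted?)


def G(n: Dict[str, object]) -> List[Action]:
    """Hand-written nomination of the allowed actions for dialog state N."""
    if n["top_callee"] is None:
        return [("First and last name", "AskName")]
    actions = [("Sorry, name please?", "AskName"),
               (f"{n['top_callee']}?", "Confirm")]
    if n["confirm_attempted"]:  # business rule: no transfer before a confirmation attempt
        actions.append((f"Transferring to {n['top_callee']}", "Transfer"))
    return actions


def H(b: Dict[str, float], n: Dict[str, object]) -> Features:
    """State features X-hat drawn from both the belief state and the dialog state."""
    top_prob = max(b.values()) if b else 0.0
    return (top_prob, bool(n["confirm_attempted"]))


def q_hat(x: Features, a_hat: str) -> float:
    """Toy stand-in for the learned value function (made-up numbers)."""
    p, _ = x
    return {"AskName": 5.0, "Confirm": 10.0 * p + 2.0, "Transfer": 40.0 * p - 21.0}[a_hat]


def select_action(nominated: List[Action], x: Features,
                  q: Callable[[Features, str], float]) -> int:
    """Return the index M* maximizing Q-hat over the nominated actions only."""
    if not nominated:
        raise ValueError("the dialog manager must nominate at least one action")
    return max(range(len(nominated)), key=lambda m: q(x, nominated[m][1]))
```

In this toy setup, a belief of 0.9 in the top callee would make Transfer the highest-valued summary action, yet it is never chosen before a confirmation attempt because `G` does not nominate it. The constraint lives in the hand-written code, and the optimization only ranks what is allowed.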

In this method, action selection can be viewed as a general reinforcement learning problem, where states are feature vectors X̂. This enables any general-purpose reinforcement learning technique to be applied, such as value function approximation, in either an off-line or on-line setting. The only requirement is that the learning technique produces an estimate of Q̂(X̂, Â). Moreover, intuitively, the effect of constraining which actions are available prunes the space of policies, so if the constraints are well-informed, then optimization ought to converge to the optimal policy faster.

One example of the method embodiment is shown in FIG. 2. This generally refers to a method for generating a natural language spoken dialog system or a component of such a system. The system 100 receives a manually nominated set of allowed dialog actions and a set of contextual features at each turn in a dialog (202). Next, the method includes selection of an optimal action from the set of nominated allowed dialog actions using a machine learning algorithm (204). In one aspect, the method also includes generating a response based on the selected optimal action at each turn in the dialog (206). In another aspect, this last step involves generating a spoken dialog system based on the process of selecting optimal actions at each dialog turn.

This method has been tested on an existing voice dialer application within the AT&T research lab which receives daily calls. The dialer's vocabulary consists of 50,000 AT&T employees. Since many employees have the same name, the dialer can disambiguate by asking for the called party's location. The dialer can also disambiguate between multiple phone listings for the same person (office and mobile) and can indicate when a called party has no number listed. This dialog manager tracks a variety of elements in its state N, including the most recently recognized called party, how many called parties share that name, whether the called party has been confirmed, and many others. This existing dialer was used as our baseline, labeled as HC 306 in FIG. 3.

The experiment then created the POMDP. The belief state followed a model known as the SDS-POMDP model, and maintained a belief state over all called parties. The user model and speech recognition models used to update the belief state were based on transcribed logs from 320 calls.

The experiment extended the existing dialer in at least two respects. First, rather than tracking the most recently recognized called parties or callees, it instead obtained the most likely callee from the POMDP belief state. Second, it was altered to nominate a set of one or more allowable actions using knowledge about this domain. For example, on the first turn of the dialog, the only allowed action was to ask for the callee's name. Once a callee has been recognized, the system can query the callee or confirm received information. Additional actions are allowed depending on the properties of the most likely callee. For example, if the top callee is ambiguous, then asking for the callee's city and state is allowed. If the top callee has both a cellphone and office phone listed, then asking for the type of phone is allowed. The transfer action is permitted only after the system has attempted confirmation. This unified controller was called “HC+POMDP” 408 in FIG. 4.

Because the actions were nominated by the hand-crafted dialog manager, tailoring the prompt wordings to the dialog context was straightforward. For example, the first request for the callee's name was “First and last name”, whereas the second was “Sorry, name please?” These both appeared to the planning algorithm as the summary action Â=AskName. Also, when a callee's name is ambiguous, it ought to be confirmed with the callee's location or, in other settings, with some other piece of unique information.

For comparison, another controller was created which nominated every action at every time-step. Its actions also acted on the most likely callee in the belief state but no other restrictions were imposed. It could, for example, transfer a call to a callee who has not been confirmed, or ask for the city and state even if the top callee was not ambiguous. This controller was called “POMDP” 406 in FIG. 4.

For optimization, the state features include at least two continuous features and several discrete features. The continuous features are taken from the belief state and are the probability that the top callee is correct, and the probability that the top callee's type of phone (office or cell) is correct. The discrete features are how many phone types the top callee has (none, one, two), whether the top callee is ambiguous (yes, no), and whether confirmation has yet been requested for the top callee (yes, no).

Finally, a simple reward function was created which assigns −1 per system action, plus +20 for correctly transferring the caller or −20 for incorrectly transferring the caller at the end of the call.
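A reward function of this shape might be written as below. The function signature and the action name `Transfer` are hypothetical, and the sketch assumes that the transfer action itself also incurs the per-action cost of −1, which the text does not state explicitly.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, Optional[str], Optional[str]]  # (action, transferred_to, intended)


def reward(action: str, transferred_to: Optional[str] = None,
           intended: Optional[str] = None) -> float:
    """-1 for every system action; +20 or -20 on top of that for a transfer."""
    r = -1.0
    if action == "Transfer":
        r += 20.0 if transferred_to == intended else -20.0
    return r


def dialog_return(steps: List[Step]) -> float:
    """Total (undiscounted) reward accumulated over one dialog."""
    return sum(reward(*step) for step in steps)
```

Under these assumptions, a three-action dialog ending in a correct transfer returns 17, while the same dialog ending in a wrong transfer returns −23, so the optimizer is pushed toward short dialogs but not at the expense of confirming uncertain callees.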

Optimization was performed on “POMDP” 406 and “HC+POMDP” 408 using dialog simulation with the user and ASR models estimated from the 320 transcribed calls. The optimization method roughly follows summary point-based value iteration. Space limitations preclude a complete description. K synthetic dialogs were generated by randomly choosing allowed actions. The space of state features was quantized into small regions, and transition and reward functions over these regions were estimated by frequency counting, applying some smoothing to mitigate data sparsity. The system 100 then applied straightforward value iteration to the estimated transition and reward functions to produce a value function Q̂(X̂, Â). The optimization procedure, simulation environment, state features, and action set were identical for “POMDP” 406 and “HC+POMDP” 408. The only difference was whether the set of allowed actions was constrained or not.
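The three stages named above (quantize the features, estimate transition and reward functions by frequency counting, then run value iteration) can be illustrated with the following sketch. It is not the procedure used in the experiment: it omits smoothing, the bin count and the discount factor `gamma` are assumptions not given in the text, and the sample format is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple

Region = Hashable
Sample = Tuple[Region, str, float, Optional[Region]]  # (x, a, r, x_next); x_next None = dialog ended


def quantize(p: float, bins: int = 5) -> int:
    """Map a continuous feature in [0, 1] to one of `bins` regions."""
    return min(int(p * bins), bins - 1)


def estimate_model(samples: Iterable[Sample]):
    """Frequency-count estimates of P(x_next | x, a) and the mean reward R(x, a)."""
    counts = defaultdict(lambda: defaultdict(float))
    reward_sum = defaultdict(float)
    visits = defaultdict(float)
    for x, a, r, x2 in samples:
        counts[(x, a)][x2] += 1.0
        reward_sum[(x, a)] += r
        visits[(x, a)] += 1.0
    trans = {k: {x2: c / visits[k] for x2, c in nxt.items()} for k, nxt in counts.items()}
    rew = {k: reward_sum[k] / visits[k] for k in visits}
    return trans, rew


def value_iteration(trans, rew, gamma: float = 0.95, iters: int = 200) -> Dict:
    """Iterate Q(x, a) = R(x, a) + gamma * sum_x' P(x' | x, a) * max_a' Q(x', a')."""
    q = {k: 0.0 for k in rew}
    for _ in range(iters):
        best: Dict[Region, float] = {}
        for (x, _a), val in q.items():
            best[x] = max(best.get(x, float("-inf")), val)
        q = {
            k: rew[k] + gamma * sum(
                p * best.get(x2, 0.0)          # unseen regions default to value 0
                for x2, p in trans[k].items() if x2 is not None
            )
            for k in rew
        }
    return q
```

Because only (region, action) pairs that occur in the sampled dialogs receive estimates, restricting the sampled actions to the nominated set also shrinks the table that value iteration has to fill.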

Using the system described above, optimization was conducted for various numbers of K dialogs for “POMDP” 308 and “HC+POMDP” 310, ranging from K=10 to K=10,000. After optimization, each policy was evaluated in simulation for 1000 dialogs to find the average return, average task completion rate, and average dialog length. The simulation environments for optimization and evaluation were identical. For each value of K, this whole process (optimization and evaluation) was run 10 times, and the results of the 10 runs were averaged. 1000 simulated dialogs were also run with the baseline “HC” 306, using the same simulation environment.

Results 300 for task completion rate are shown in FIG. 3. As the number of training dialogs 302 increases, performance of both POMDP 308 and HC+POMDP 310 increases to roughly the same asymptote. With sufficient training dialogs, both POMDP 308 and HC+POMDP 310 are able to out-perform the baseline on average task completion rate 304. However, HC+POMDP 310 reaches this asymptote with many fewer dialogs. Moreover, inspection of the 10 runs at each value of K showed that the HC+POMDP 310 policies were significantly more consistent. FIG. 4 shows that the standard deviation of the average total reward per dialog over the 10 runs is lower for HC+POMDP 408 than for POMDP 406.

These results verify that incorporating a POMDP into a conventional dialog system in accordance with the principles disclosed herein increases performance. Moreover, compared to a pure POMDP, this method reduces training time and yields more consistent results. In other words, not only does this approach combine the strengths of the two methods, it can also reduce optimization time and produce spurious policies less often.

One of the policies created with this method, trained on 10,000 simulated dialogs, was experimentally installed in our internal phone system. It uses up to 100 ASR N-best entries and maintains a dialog beam of up to 100 callers. Its response time of 2-3 seconds is essentially identical to the baseline system. FIG. 5 shows an example conversation illustrating operation of the method in detail.

FIG. 5 illustrates an example conversation and operation 500 of the HC+POMDP dialog manager in detail. The first column 502 shows the POMDP belief state B(S), which is a distribution over all possible callees. The second column 504 shows the conventional dialog state N, which includes the name of the most likely callee in the belief state, whether the top callee has been confirmed, and how many other callees share the top callee's name. The third column 506 shows some of the state features in X̂, including the belief in the top callee (extracted from B) and whether the callee's name is ambiguous or confirmed (extracted from N). The fourth column 508 shows the actions A (and summary actions Â below) nominated by the conventional dialog manager. Because the conventional dialog manager has been designed by hand, the system can tailor the prompts to the dialog context; for example, confirming an ambiguous callee includes their location. The fifth column 510 shows the value estimated by the optimization for each summary action given the current state features Q̂(X̂, Â), and a box indicates the maximum Q̂ value, which is output in the sixth column 512, showing the dialog transcript. The entries in upper-case letters show the results from the recognition process with confidence scores. Note that after the second system turn, “Jason Williams” has appeared on the N-Best list twice, and as a result in the beginning of the third system turn it has acquired the most mass in the belief state.

The disclosure herein presents a novel method to unify conventional dialog design practices in industry with the emerging approach in research based on partially observable Markov decision processes (POMDPs). The POMDP belief state and the conventional dialog state run in parallel, and the conventional dialog manager is augmented so that it nominates a set of one or more acceptable actions. The POMDP then chooses an action from this limited set. The method naturally accommodates compression akin to the “summary” method, and this enables the method to scale to non-trivial domains, here a voice dialer application covering 50,000 listings. Simulation experiments drawing on usage data from a real dialog system demonstrated that the method outperformed our existing baseline dialer, while simultaneously requiring less training data than a classical POMDP. This method can place POMDPs in a better position for use in commercial applications.

This disclosure teaches a method of building dialog systems by unifying conventional practices and POMDPs to gain the benefits of both: the fine-grain control of the conventional approach, and the robustness to errors of the POMDP. In one aspect, the conventional and POMDP systems run in parallel with several modifications. First, the conventional system, which usually outputs a single action, is modified to output a set of one or more allowed actions. These allowed actions are specified at a detailed level (as in a conventional system), tailored to the current dialog context. The idea is that each of the allowed actions is permissible in the current context according to business rules, conversational norms, or other criteria, but the optimal action isn't clear ahead of time to a developer or designer. For example, in a travel system, after an origin city has been recognized, the allowed actions might include re-asking the origin city or confirming it. Actions such as printing a ticket might not be allowed because no destination has been recognized yet, or because important terms and conditions haven't been read to the caller.

This set of allowed actions is then passed to the POMDP. The classical POMDP formulation is to consider every possible action, however unlikely or impossible. Instead, in this approach the POMDP chooses the best action within this restricted set of allowed actions. The POMDP does this by examining all of its hypotheses for the current dialog state as well as metadata describing the current dialog state and/or previous dialog turns, and using these to assign a score to each allowed action. This score is a measure of suitability for the action given the current set of hypotheses. The system returns the action with the highest score to the conventional dialog manager, which plays it out to the user. This process then continues until the dialog terminates.
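
The selection step described above can be sketched, under stated assumptions, as a maximization restricted to the nominated set. The belief representation (a mapping from hypothesis to probability) and the `toy_score` function are hypothetical; in particular, `toy_score` is a hand-written stand-in and not the optimized scoring the disclosure contemplates.

```python
from typing import Callable, Dict, List

Belief = Dict[str, float]  # hypothesis -> probability (hypothetical representation)


def toy_score(belief: Belief, action: str) -> float:
    """Hand-written stand-in for a learned score; not from the disclosure."""
    top = max(belief.values(), default=0.0)
    if action == "confirm_origin":
        return top        # confirming is more suitable when fairly sure
    if action == "ask_origin":
        return 1.0 - top  # re-asking is more suitable when unsure
    return 0.0


def select_action(allowed: List[str], belief: Belief,
                  score: Callable[[Belief, str], float] = toy_score) -> str:
    """Pick the highest-scoring action among the nominated ones only."""
    if not allowed:
        raise ValueError("the dialog manager must nominate at least one action")
    return max(allowed, key=lambda action: score(belief, action))
```

Because the maximization ranges only over `allowed`, an action the conventional dialog manager did not nominate can never be returned, whatever score it would have received.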

The POMDP scores are computed based on an optimization process, and in practice the number of possible state hypotheses and system outputs is too large to optimize directly. As a result, this approach also provides a method for performing optimization. First, along with each allowable action output by the conventional dialog manager, an “action mnemonic” is also output. This mnemonic is a compressed version of the action. For example, there are hundreds of actions like “From New York?”, “From Boston?”, and “From Salt Lake City?” which the system 100 can map to a single compressed mnemonic “ConfirmOrigin”. In addition, the full list of dialog hypotheses is compressed to a set of state features. For example, the top hypothesis might be an itinerary from New York to Boston and have probability 0.6, and the state features might include the likelihood of the most likely itinerary (0.6) but drop the actual cities. Crucially, these state features may also include elements from the traditional dialog state, such as whether any flights are available from New York to Boston. The synthesis of the conventional and POMDP states into a set of features for optimization allows the optimization to take into account business logic.

The principles described herein can provide several benefits, including the creation of dialog systems which are more robust to speech recognition errors. In the context of telephone-based customer-care applications, this increased robustness can enable systems to achieve higher task completion rates. Moreover, because user satisfaction is highly correlated with how well users believe they are understood, these higher task completion rates are coupled with an increase in customer satisfaction. The addition of the POMDP reduces the chances of a user “getting stuck” because of a speech recognition error, and users are more likely to accomplish their goals successfully. This approach can enable previously infeasible customer care applications for tasks such as troubleshooting or mobile phone configuration.
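
As a minimal sketch of the compression described above, and assuming a belief represented as a mapping from itinerary to probability, the detailed action can be paired with its compression label and the hypothesis list reduced to a few features. The function names, the `Itinerary` tuple, and the `flights_available` flag are hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

Itinerary = Tuple[str, str]  # (origin, destination); a hypothetical representation


def confirm_origin(city: str) -> Tuple[str, str]:
    """Pair a fully worded prompt with its compression label."""
    return (f"From {city}?", "ConfirmOrigin")


def state_features(belief: Dict[Itinerary, float],
                   flights_available: bool) -> Tuple[float, bool]:
    """Keep the top hypothesis probability, drop the cities, add a business flag."""
    top_probability = max(belief.values(), default=0.0)
    return (top_probability, flights_available)
```

Under these assumptions, optimization would operate on pairs such as `(0.6, True)` and labels such as `ConfirmOrigin`, while the detailed wording of the prompt remains under the control of the conventional dialog manager.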

The principles disclosed herein can also have application in new contexts such as a speech-enabled electronic program guide (EPG) for television. The family room environment is likely to be noisy, and thus speech recognition errors will be common. The added robustness coupled with a full implementation of business policies and the expertise of dialog designers can be an important enabler in this space. Other applications of these principles include mobile device directory assistance, such as Yellow Pages searches, or multi-modal interactions on devices such as the iPhone in a variety of challenging out-of-home environments like cars, trains, and airports. Industries such as utilities, health care, airlines, and government would benefit from the principles disclosed herein.

Embodiments within the scope of the present disclosure may also include tangible computer-readable storage media for carrying or having computer-executable instructions or data structures stored thereon. Such computer-readable storage media can be any available media that can be accessed by a general purpose or special purpose computer, including the functional design of any special purpose processor as discussed above. By way of example, and not limitation, such computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions, data structures, or processor chip design. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or combination thereof) to a computer, the computer properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.

Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions. Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments. Generally, program modules include routines, programs, components, data structures, objects, and the functions inherent in the design of special-purpose processors, etc. that perform particular tasks or implement particular abstract data types. Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.

Those of skill in the art will appreciate that other embodiments of the disclosure may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

The various embodiments described above are provided by way of illustration only and should not be construed to limit the scope of the disclosure. Those skilled in the art will readily recognize various modifications and changes that may be made to the principles described herein without following the example embodiments and applications illustrated and described herein, and without departing from the spirit and scope of the disclosure.

We claim:
 1. A method comprising: nominating, via a processor configured to use a partially observable Markov decision process in parallel with a conventional dialog state, a set of contextual features; and generating an audible response in a dialog between a user and a spoken dialog system based at least in part on the set of contextual features.
 2. The method of claim 1, further comprising: nominating, via the processor configured to use the partially observable Markov decision process in parallel with the conventional dialog state, a set of dialog actions; and generating the audible response based on the set of dialog actions.
 3. The method of claim 2, further comprising generating the audible response based on the set of dialog actions and via a machine learning algorithm.
 4. The method of claim 3, further comprising augmenting the machine learning algorithm using reinforcement learning.
 5. The method of claim 4, wherein the machine learning algorithm augmented by the reinforcement learning is based on the partially observable Markov decision process.
 6. The method of claim 3, further comprising: assigning a reward to the set of dialog actions as part of the machine learning algorithm.
 7. The method of claim 1, further comprising tailoring wordings in a spoken dialog system associated with the dialog based on a current context and a set of business rules.
 8. A system comprising: a processor configured to use a partially observable Markov decision process in parallel with a conventional dialog state; and a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising: nominating a set of contextual features; and generating an audible response in a dialog between a user and a spoken dialog system based at least in part on the set of contextual features.
 9. The system of claim 8, further comprising: nominating, via the processor configured to use the partially observable Markov decision process in parallel with the conventional dialog state, a set of dialog actions; and generating the audible response based on the set of dialog actions.
 10. The system of claim 9, wherein the computer-readable storage medium stores additional instructions stored which, when executed by the processor, cause the processor to perform operations further comprising: generating the audible response based on the set of dialog actions and via a machine learning algorithm.
 11. The system of claim 10, wherein the computer-readable storage medium stores additional instructions stored which, when executed by the processor, cause the processor to perform operations further comprising: augmenting the machine learning algorithm using reinforcement learning.
 12. The system of claim 11, wherein the machine learning algorithm augmented by the reinforcement learning is based on the partially observable Markov decision process.
 13. The system of claim 10, wherein the computer-readable storage medium stores additional instructions stored which, when executed by the processor, cause the processor to perform operations further comprising: assigning a reward to the set of dialog actions as part of the machine learning algorithm.
 14. The system of claim 8, wherein the computer-readable storage medium stores additional instructions stored which, when executed by the processor, cause the processor to perform operations further comprising: tailoring wordings in a spoken dialog system associated with the dialog based on a current context and a set of business rules.
 15. A computer-readable storage device having instructions stored which, when executed by a processor configured to use a partially observable Markov decision process in parallel with a conventional dialog state, cause the processor to perform operations comprising: nominating a set of contextual features; and generating an audible response in a dialog between a user and a spoken dialog system based at least in part on the set of contextual features.
 16. The computer-readable storage device of claim 15, wherein the computer-readable storage device stores additional instructions stored which, when executed by the processor, cause the processor to perform operations further comprising: nominating, via the processor configured to use the partially observable Markov decision process in parallel with the conventional dialog state, a set of dialog actions; and generating the audible response based on the set of dialog actions.
 17. The computer-readable storage device of claim 16, wherein the computer-readable storage device stores additional instructions stored which, when executed by the processor, cause the processor to perform operations further comprising: generating the audible response based on the set of dialog actions and via a machine learning algorithm.
 18. The computer-readable storage device of claim 17, further comprising augmenting the machine learning algorithm using reinforcement learning.
 19. The computer-readable storage device of claim 18, wherein the machine learning algorithm augmented by the reinforcement learning is based on the partially observable Markov decision process.
 20. The computer-readable storage device of claim 17, further comprising: assigning a reward to the set of dialog actions as part of the machine learning algorithm.